Network-Aware Task Assignment for MapReduce Applications in Shared Clusters

Authors

Fei Xu

Fangming Liu

Peng Yin

Hai Jin

Abstract

Running MapReduce applications in shared clusters is increasingly compelling as a way to improve cluster utilization. However, network sharing across diverse applications can make the network bandwidth available to MapReduce applications constrained and heterogeneous, which inevitably increases the severity of network hotspots in racks and renders existing task assignment policies that focus on data locality ineffective. To deal with this issue, this paper proposes a lightweight network-aware task assignment strategy for MapReduce applications in shared clusters. By analyzing the relationship between job completion time and the assignment of both map and reduce tasks across racks, it devises and integrates two simple yet effective greedy heuristics, which can minimize the completion time of the map phase and the reduce phase, respectively. With extensive prototype experiments on a 12-node, 3-rack MapReduce cluster and complementary large-scale simulations driven by Facebook job traces, we demonstrate that our network-aware strategy can shorten the completion time of MapReduce jobs in comparison to state-of-the-art task assignment strategies, with an acceptable computational overhead.
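The abstract does not spell out the two heuristics, so the following is only an illustrative sketch of what a greedy, network-aware assignment of tasks to racks might look like, not the authors' algorithm. Everything in it is an assumption made for illustration: the `Task` type, the `greedy_assign` function, the cost model (one execution slot per rack, a transfer cost of input size divided by the destination rack's available bandwidth when the task runs away from its data, and a fixed compute cost per megabyte), and the largest-task-first ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description (not taken from the paper)."""
    name: str
    size_mb: float   # size of the task's input data, in MB
    home_rack: str   # rack that already stores the input data


def greedy_assign(tasks, bandwidth_mb_per_s, compute_s_per_mb=0.01):
    """Greedily place each task on the rack that would finish it earliest.

    Assumed cost model: each rack runs its tasks one after another;
    a task placed outside its home rack first pays
    size_mb / bandwidth_mb_per_s[rack] seconds to fetch its input.
    Returns (placement dict, estimated phase completion time in seconds).
    """
    load = {rack: 0.0 for rack in bandwidth_mb_per_s}  # busy time per rack
    placement = {}

    # Largest tasks first, so the big items are placed while racks are empty.
    for task in sorted(tasks, key=lambda t: t.size_mb, reverse=True):

        def finish_time(rack):
            transfer = 0.0
            if rack != task.home_rack:
                transfer = task.size_mb / bandwidth_mb_per_s[rack]
            return load[rack] + transfer + task.size_mb * compute_s_per_mb

        best = min(load, key=finish_time)
        load[best] = finish_time(best)
        placement[task.name] = best

    return placement, max(load.values())


if __name__ == "__main__":
    demo_tasks = [Task("t1", 100.0, "A"), Task("t2", 100.0, "A")]
    # Rack B has spare bandwidth, so one task is moved off the hot rack A.
    print(greedy_assign(demo_tasks, {"A": 100.0, "B": 200.0}))
    # Rack B is congested, so both tasks stay local on rack A.
    print(greedy_assign(demo_tasks, {"A": 100.0, "B": 10.0}))
```

In this toy model a locality-only policy would always keep both tasks on rack A, whereas the bandwidth-aware choice moves one task only when the remote rack's available bandwidth makes that faster, which is the kind of trade-off the abstract describes.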

Similar resources

Boosting MapReduce with Network-Aware Task Assignment

Running MapReduce in a shared cluster has become a recent trend to process large-scale data analytics applications while improving the cluster utilization. However, the network sharing among various applications can lead to constrained and heterogeneous network bandwidth available for MapReduce applications. This further increases the severity of network hotspots in racks, and makes existing ta...

Survey on Task Assignment Techniques in Hadoop

MapReduce is an implementation for processing large scale data parallelly. Actual benefits of MapReduce occur when this framework is implemented in large scale, shared nothing cluster. MapReduce framework abstracts the complexity of running distributed data processing across multiple nodes in cluster. Hadoop is open source implementation of MapReduce framework, which processes the vast amount o...

HaLoop: Efficient Iterative Data Processing on Large Clusters

With the recent growth of the demand for large-scale data mining and data analysis, both industry and academia have begun to design highly scalable data-intensive computing platforms. MapReduce and Dryad are two scalable frameworks for distributed data-intensive DAG(directed acyclic graph) applications. However, they do not have built-in support for iterative programs, which is a common approac...

Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology

By pervasiveness of cloud computing, a colossal amount of applications from gigantic organizations increasingly tend to rely on cloud services. These demands caused a great number of applications in form of couple of virtual machines (VMs) requests to be executed on data centers’ servers. Some of applications are as big as not possible to be processed upon a single VM. Also, there exists severa...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

Publication date: 2015
